Investigating the Behaviour of Q(λ)

Authors

  • J Wyatt
  • G Hayes
  • J Hallam
Abstract

There is a spectrum of methods for learning robot control. At one end are model-free methods (e.g. Q-learning, AHC, bucket brigade), and at the other are model-based methods (e.g. dynamic programming by value or policy iteration). The advantage of one technique is the weakness of the other. Model-based methods use experience effectively but are computationally expensive; model-free methods are computationally cheap but require an order of magnitude more experience. In the middle ground there are now methods such as Q-DYNA [4] and Prioritised Sweeping [1], which use a learned model to speed temporal credit assignment. The optimal trade-off depends on the relative cost of experience and computation for the particular task. Unfortunately, we frequently do not know this cost balance. Hence the ultimate goal of this work is to understand more about the sorts of methods that might work well across a wide variety of cost ratios, and in particular how model-free methods might be extended. In this paper we examine the behaviour of one such model-free algorithm, Q(λ) [2]. This algorithm shows promise because it combines the best features of Sutton's TD(λ) algorithm [5] with those of Watkins' Q-learning [6]. Despite being a model-free algorithm, it has been reported to outperform Prioritised Sweeping, the current best method for learning a policy and a model at the same time. Here we look at the effect on its performance of using replacing or accumulating traces, and at the problem of exploration sensitivity.

1. $t = 0$; $\hat{Q}(x, a) = 0$ and $Tr(x, a) = 0$, $\forall x, a$
2. $a_t = \arg\max_a \hat{Q}(x_t, a)$
3. $e'_t = r_t + \gamma \hat{V}_t(x_{t+1}) - \hat{Q}_t(x_t, a_t)$; $e_t = r_t + \gamma \hat{V}_t(x_{t+1}) - \hat{V}_t(x_t)$
4. $\forall x, a$ do: $Tr(x, a) = \gamma \lambda \, Tr(x, a)$; $\hat{Q}_{t+1}(x, a) = \hat{Q}_t(x, a) + \alpha \, Tr(x, a) \, e_t$
5. $\hat{Q}_{t+1}(x_t, a_t) = \hat{Q}_t(x_t, a_t) + \alpha e'_t$
6. $Tr(x_t, a_t) = Tr(x_t, a_t) + 1$
7. Go to step 2.

$x_t$ is the state at time $t$. $a_t$ is the action chosen at time $t$. …
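The numbered update steps in the abstract can be sketched in Python as a single-transition update. This is a minimal illustration, not the authors' implementation. It rests on several assumptions: the discount `gamma`, trace decay `lam` and learning rate `alpha` follow the usual formulation of Q(λ), the state value is taken as the maximum Q-value over actions, and the names `make_tables` and `q_lambda_step` and the `replacing` flag are hypothetical.

```python
def make_tables(states, actions):
    """Step 1: zero-initialise the Q estimates and the eligibility traces."""
    Q = {(x, a): 0.0 for x in states for a in actions}
    Tr = {(x, a): 0.0 for x in states for a in actions}
    return Q, Tr


def q_lambda_step(Q, Tr, x, a, r, x_next, actions,
                  alpha=0.1, gamma=0.9, lam=0.5, replacing=False):
    """Apply steps 3-6 to one observed transition (x, a, r, x_next).

    Assumption: V(x) = max_a Q(x, a); alpha, gamma and lam are
    illustrative default values, not taken from the paper.
    """
    v_next = max(Q[(x_next, b)] for b in actions)
    v_curr = max(Q[(x, b)] for b in actions)

    # Step 3: the two temporal-difference errors.
    e_prime = r + gamma * v_next - Q[(x, a)]
    e = r + gamma * v_next - v_curr

    # Step 4: decay every trace, then move every Q estimate along its trace.
    for key in Tr:
        Tr[key] *= gamma * lam
        Q[key] += alpha * Tr[key] * e

    # Step 5: extra correction for the state-action pair just visited.
    Q[(x, a)] += alpha * e_prime

    # Step 6: accumulating traces add 1; replacing traces reset to 1.
    if replacing:
        Tr[(x, a)] = 1.0
    else:
        Tr[(x, a)] += 1.0
    return e_prime, e
```

Step 2 (greedy action selection) and the surrounding loop are left to the caller, so that any exploration scheme can be used. Switching `replacing` changes only step 6, which is the contrast between replacing and accumulating traces that the abstract refers to.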


Similar resources

Investigating the Effect of Underlying Fabric on the Bagging Behaviour of Denim Fabrics (RESEARCH NOTE)

Underlying fabrics can change the appearance, function and quality of a garment, and can also add considerably to its longevity. With the increasing use of various types of fabric in the garment industry, resistance to bagging is of great importance when determining how textiles perform under various forces. The current paper investigated the effect of unde...


Evaluation of modification of behaviour of the pregnant women in the field of urinary infections based on the health belief model

Introduction: One of the most common problems among women during pregnancy is urinary infection (UI). Pregnant women are highly susceptible to UI due to bodily changes, and because of its potential complications for mothers and their fetuses, UI receives particular attention. The current study aimed at investigating the modifiability of behaviour of the pregnant women in the field of urinary infecti...


Assessing the role of self efficacy and social tendencies in green purchase intention and behaviour

The present study aimed to investigate the role of self-efficacy and social tendencies in green purchase intention and behaviour. It is applied research in terms of purpose and descriptive-survey research in terms of data collection method. In this study, the statistical population includes all consumers of organic products supplied at Ofogh Koorosh chain...


Stimulation Behaviour Study on Clay Treated with Ground Granulated Blast Slag and Groundnutshell Ash

A major decision in the construction process is the selection of a suitable site with the best soil conditions, since the structure rests on the soil. Problematic soils such as expansive soils have rarely proved to be a good engineering subgrade for pavement construction. This has led to soil improvement options accompanied by a reduction in resource depletion and solid wa...


Titanium and Fluoride Co-substitution in Hydroxyl Apatite

Titanium- and fluoride-containing hydroxyl apatite was synthesized by a precipitation method followed by a hydrothermal stage at 100°C for 6 hours. XRD analysis of the samples calcined at 650°C for 1 hour revealed that all samples have a pure apatite structure. The existence of fluoride substitution in the apatite structure was confirmed by FTIR (Fourier Transform Infrared Spectroscopy) analysis. Su...


A survey about individuals motives of participation intention in sports activity (Experimental test of the theory of planned behaviour)

Purpose: In this study, the theory of planned behaviour was used to better understand participation in sport. Method: This study used a cross-sectional survey method. Data were gathered from 360 students at the University of Yazd. The data collection instrument was the revised planned behaviour scale. Results: Results from the empirical test of the theory of planned behaviour with students participati...




Publication date: 1996